Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import many files and generate an `id' variable as the file name.

    Dear All, Suppose that I have the following 3 files (in fact, about 4,000 similar files). The file name is 1, 2, 4 with extension xlsx. I wish to import each file, then generate an `id' variable with values all equal to the file name 1, 2, and 4, for all observations in the file, respectively. Finally, I need to -append- them together. Any suggestions?
    Attached Files
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Code:
    cd <wherever you stored your files>
    import excel using 1.xlsx
    gen file_id = 1
    save combined, replace
    foreach i = 2/3 {
        import excel using `i'.xlsx
        gen file_id = `i'
        append using combined
        save combined, replace
    }
    If you need to do more preparation of the excel files, and the block gets very long, you can store the commands for loading and manipulating of one excel in one .do file and pass the file number as an argument to that .do file. Here is a brief discussion of that http://www.maartenbuis.nl/workshops/...l#slide16.smcl
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Dear Maarten, Many thanks for this suggestion.
      1. However, the filenames are NOT consecutive (say, 1,2 4, ..., 300306, ..., 600901, ...).
      2. I will try to see if I can understand the brief discussion that you suggested.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)

      Comment


      • #4
        Assuming that all the excel files in the directory contains are the ones you want to append, and that the names contains only numbers

        Code:
        cd <where ever you stored your files>
        local files  : dir files "*.xls"
        gettoken first files : files
        import excel using `first'
        gettoken num rest: first, parse(".")
        gen file_id = `num'
        save combined, replace
        foreach file of local files {
            import excel using `file'
            gettoken num rest: file, parse(".")
            gen file_id = `num'
           append using combined
           save combined
        }
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Although less flexible, xls2dta (SSC) implements these kinds of loops for you.

          Code:
          *ssc install xls2dta
          cd <whereever you stored your files>
          xls2dta , clear generate(file_id) : append using .
          replace file_id = regexs(1) if regexm(file_id, "([0-9]+)")
          Best
          Daniel

          Comment


          • #6
            Dear Maarten, Thanks again. I will give it a try ASAP.
            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)

            Comment


            • #7
              Dear Daniel, I have used your -xls2dta- command from time to time. In this case, I will try it again. Thanks a lot.
              Ho-Chuan (River) Huang
              Stata 17.0, MP(4)

              Comment

              Working...
              X